VIDEO SUMMARY DESCRIPTION SCHEME AND METHOD AND SYSTEM OF
VIDEO SUMMARY DESCRIPTION DATA GENERATION FOR EFFICIENT 
OVERVIEW AND BROWSING 

TECHNICAL FIELD 

The present invention relates to a video summary description scheme for efficient video overview and browsing, and also relates to a method and system for generating video summary description data that describes a video summary according to that scheme.

The technical fields to which the present invention pertains are content-based video indexing and browsing/searching, and summarizing video according to its content and then describing the summary.

BACKGROUND OF THE INVENTION 

Video summaries largely fall into two formats: dynamic summaries and static summaries. The video description scheme according to the embodiments of the present invention efficiently describes both dynamic and static summaries within a single unified description scheme.

Generally, because existing video summaries and description schemes simply provide information on the video intervals included in the video summary, they are limited to conveying the overall video contents through playback of the video summary.

However, in many cases users need not only an overview of the overall contents through the video summary but also browsing, which lets them identify and revisit the parts they are interested in after that overview.

Also, an existing video summary provides only the video intervals considered important according to criteria determined by the video summary provider. Accordingly, if the criteria of the users and of the video provider differ, or if users have special criteria, the users cannot obtain the video summary they want.

That is, although existing video summaries let users select a summary at a desired level by providing summaries at several levels, the users' choice is limited: they cannot select a summary by its contents.

U.S. Patent No. 5,821,945, entitled "Method and apparatus for video browsing based on content and structure," represents video in a compact form and provides browsing functionality for accessing video with the desired content through that representation.

However, that patent pertains to static summaries based on representative frames. Like existing static summaries, it summarizes video by the representative frame of each video shot, and that representative frame provides only visual information about the shot, so the patent is limited in how much information its summary scheme can convey.

In contrast to that patent, the video description scheme and browsing method of the embodiments described herein use a dynamic summary based on video segments.

A video summary description scheme was proposed in the MPEG-7 Description Scheme (V0.5), announced as ISO/IEC JTC1/SC29/WG11 MPEG-7 Output Document No. N2844 in July 1999. Because that scheme describes only the interval information of each video segment of a dynamic video summary, it provides basic functionality for describing dynamic summaries but has problems in the following respects.

First, it cannot provide access to the original video from the summary segments constituting the video summary. That is, when users want to access the original video to obtain more detailed information after getting an overview of the summary contents through the video summary, the existing scheme cannot meet that need.
Second, the existing scheme does not provide sufficient functionality for describing audio summaries.

Finally, when an event-based summary is represented, duplicate descriptions and complex searching are unavoidable.

SUMMARY OF THE INVENTION

The disclosed embodiments of the present invention provide a hierarchical video summary description scheme, and a method and system for generating video summary description data using that scheme. The scheme includes the representative frame information and the representative sound information of each video interval included in the video summary, and it enables efficient browsing as well as user-customized, event-based summaries in which users select the contents of the video summary.

To achieve the foregoing, the HierarchicalSummary DS according to one embodiment of the present invention comprises at least one HighlightLevel DS, which describes a highlight level, and the HighlightLevel DS comprises at least one HighlightSegment DS, which describes the information of a highlight segment constituting the video summary at that highlight level.

Preferably, the HighlightLevel DS further comprises at least one lower-level HighlightLevel DS.

More preferably, the HighlightSegment DS comprises a VideoSegmentLocator DS, which describes the time information or the video data itself of the corresponding highlight segment.

Preferably, the HighlightSegment DS further comprises an ImageLocator DS, which describes the representative frame of the corresponding highlight segment.

More preferably, the HighlightSegment DS further comprises a SoundLocator DS, which describes the representative sound information of the corresponding highlight segment.






Preferably, the HighlightSegment DS further comprises both an ImageLocator DS, which describes the representative frame of the corresponding highlight segment, and a SoundLocator DS, which describes the representative sound information of the corresponding highlight segment.
More preferably, the ImageLocator DS describes the time information or the image data of the representative frame of the video interval corresponding to the highlight segment.

Preferably, the HighlightSegment DS further comprises an AudioSegmentLocator DS, which describes the information of the audio segment constituting an audio summary of the corresponding highlight segment.

More preferably, the AudioSegmentLocator DS describes time information 
or audio data of the audio interval of the corresponding highlight segment. 

Preferably, the HierarchicalSummary DS includes a SummaryComponentList that enumerates all of the SummaryComponentTypes included in the HierarchicalSummary DS.

Preferably, the HierarchicalSummary DS also includes a SummaryThemeList DS, which enumerates the events or subjects contained in the summary and describes their IDs. The SummaryThemeList DS thereby describes an event-based summary and lets users browse the video summary by the events or subjects listed in it.

More preferably, the SummaryThemeList DS includes an arbitrary number of SummaryThemes as elements; each SummaryTheme has an id attribute representing the corresponding event or subject, and may further have a parentId attribute that gives the id of the event or subject at the upper level.

Preferably, when all of the HighlightSegments and HighlightLevels constituting a highlight level share common events or subjects, the HighlightLevel DS includes a themeIds attribute that gives the ids of those common events or subjects.
More preferably, the HighlightSegment DS includes a themeIds attribute, which uses those ids to describe the events or subjects of the corresponding highlight segment.

Also, according to the present invention, a computer-readable recording medium storing a HierarchicalSummary DS is provided. Preferably, the HierarchicalSummary DS comprises at least one HighlightLevel DS, which describes a highlight level; the HighlightLevel DS comprises at least one HighlightSegment DS, which describes the information of a highlight segment constituting the video summary at that highlight level; and the HighlightSegment DS comprises a VideoSegmentLocator DS describing the time information or the video data itself of the corresponding highlight segment.

Also, according to the embodiments of the present invention, a method is provided for generating video summary description data according to the video summary description scheme from an input original video. The method includes the following steps: a video analyzing step, which analyzes the input original video and produces a video analysis result; a summary rule defining step, which defines the summary rule for selecting video summary intervals; a video summary interval selecting step, which receives the video analysis result and the summary rule, selects from the original video the video intervals that summarize its contents, and composes video summary interval information; and a video summary describing step, which receives the video summary interval information output by the video summary interval selecting step and produces video summary description data according to the HierarchicalSummary DS.
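The four steps of this method can be pictured as a small pipeline. The following Python fragment is only a sketch under stated assumptions: the analysis result is taken to be a list of already detected episodes, the summary rule is reduced to a priority threshold plus a set of summary event types, and a plain dictionary stands in for the HierarchicalSummary DS. All function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Episode:
    start: int      # first frame of the episode
    end: int        # last frame of the episode
    event: str      # key event type, e.g. "goal"
    priority: int   # 1 = most important


@dataclass(frozen=True)
class SummaryRule:
    max_priority: int        # hypothetical rule: keep episodes at least this important
    event_types: frozenset   # summary event types passed on to the describing step


def define_summary_rule(max_priority: int, event_types) -> SummaryRule:
    # Summary rule defining step (the rule form is an assumption).
    return SummaryRule(max_priority, frozenset(event_types))


def select_summary_intervals(episodes: List[Episode], rule: SummaryRule):
    # Video summary interval selecting step: (start, end, event) per selected interval.
    return [(e.start, e.end, e.event) for e in episodes
            if e.priority <= rule.max_priority and e.event in rule.event_types]


def describe_summary(intervals, rule: SummaryRule) -> dict:
    # Video summary describing step; a dict stands in for the HierarchicalSummary DS.
    return {
        "summaryEvents": sorted(rule.event_types),
        "HighlightLevel": [{"start": s, "end": e, "event": ev} for s, e, ev in intervals],
    }


def generate_summary_description(original_video, analyze: Callable,
                                 rule: SummaryRule) -> dict:
    episodes = analyze(original_video)                      # video analyzing step
    intervals = select_summary_intervals(episodes, rule)    # interval selecting step
    return describe_summary(intervals, rule)                # describing step
```

For example, with a stub analyzer that returns a goal episode of priority 1, a foul of priority 3 and a shoot of priority 2, the rule `define_summary_rule(2, ["goal", "shoot"])` keeps only the goal and shoot intervals. Passing the analyzer in as a callable keeps the sketch independent of any particular feature extractor.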

Preferably, the video analyzing step comprises: a feature extracting step, which receives the original video, extracts features, and outputs the types of the features and the video time intervals at which they were detected; an event detecting step, which receives those feature types and time intervals and detects the key events included in the original video; and an episode detecting step, which detects episodes by dividing the original video according to the story flow on the basis of the detected events.
Preferably, the summary rule defining step defines the types of summary events on which the selection of video summary intervals is based and provides them to the video summary describing step.

More preferably, the method further comprises a representative frame extracting step, which receives the video summary interval information, extracts representative frames, and provides them to the video summary describing step.

More preferably, the method further comprises a representative sound extracting step, which receives the video summary interval information, extracts representative sounds, and provides them to the video summary describing step.

Also, according to the embodiments of the present invention, a computer-readable recording medium storing a program is provided. The program executes the following steps: a feature extracting step, which outputs the types of features and the video time intervals at which those features are detected; an event detecting step, which receives those feature types and time intervals and detects the key events included in the original video; an episode detecting step, which detects episodes by dividing the original video according to the story flow on the basis of the detected key events; a summary rule defining step, which defines the summary rule for selecting video summary intervals; a video summary interval selecting step, which receives the detected episodes and the summary rule, selects the video intervals that summarize the contents of the original video, and composes video summary interval information; and a video summary describing step, which receives the video summary interval information output by the video summary interval selecting step and generates video summary description data according to the HierarchicalSummary DS.

Also, according to the present invention, a system is provided for generating video summary description data according to the video summary description scheme from an input original video. The system includes: video analyzing means for analyzing the input original video and outputting a video analysis result; summary rule defining means for defining the summary rule for selecting video summary intervals; video summary interval selecting means for receiving the video analysis result and the summary rule, selecting the video intervals that summarize the contents of the original video, and composing video summary interval information; and video summary describing means for receiving the video summary interval information output by the video summary interval selecting means and generating video summary description data according to the HierarchicalSummary DS.

Preferably, the HierarchicalSummary DS comprises at least one HighlightLevel DS, which describes a highlight level; the HighlightLevel DS comprises at least one HighlightSegment DS, which describes the information of a highlight segment constituting the video summary at that highlight level; and the HighlightSegment DS comprises a VideoSegmentLocator DS describing the time information or the video data itself of the corresponding highlight segment.

Preferably, the video analyzing means comprises: feature extracting means for receiving the original video, extracting features, and outputting the types of the features and the video time intervals at which they were detected; event detecting means for receiving those feature types and time intervals and detecting the key events included in the original video; and episode detecting means for detecting episodes by dividing the original video according to the story flow on the basis of the detected events.

More preferably, the summary rule defining means defines the types of summary events on which the selection of video summary intervals is based and provides them to the video summary describing means.

Preferably, the system further comprises representative frame extracting means for receiving the video summary interval information, extracting representative frames, and providing them to the video summary describing means.
More preferably, the system further comprises representative sound extracting means for receiving the video summary interval information, extracting representative sounds, and providing them to the video summary describing means.

Also, according to the embodiments of the present invention, a computer-readable recording medium storing a program is provided. The program causes a computer to function as: feature extracting means for outputting the types of features and the video time intervals at which those features are detected; event detecting means for receiving those feature types and time intervals and detecting the key events included in the original video; episode detecting means for detecting episodes by dividing the original video according to the story flow on the basis of the detected key events; summary rule defining means for defining the summary rule for selecting video summary intervals; video summary interval selecting means for receiving the detected episodes and the summary rule, selecting the video intervals that summarize the contents of the original video, and composing video summary interval information; and video summary describing means for receiving the video summary interval information output by the video summary interval selecting means and generating video summary description data according to the HierarchicalSummary DS.

Also, according to the present invention, a video browsing system in a server/client environment is provided. The system includes a server equipped with a video summary description data generation system, which receives the original video, generates video summary description data on the basis of the HierarchicalSummary DS, and links the original video with the video summary description data; and a client, which uses the video summary description data to browse and navigate the video through an overview of the original video and through access to the original video on the server.

BRIEF DESCRIPTION OF THE DRAWINGS 

The embodiments of the present invention will be explained with reference 
to the accompanying drawings, in which: 
FIG. 1 is a block diagram illustrating a system for generating video
summary description data according to the description scheme of the present invention. 

FIG. 2 is a drawing that illustrates, in UML (Unified Modeling Language), the data structure of the HierarchicalSummary DS describing the video summary description scheme according to the present invention.

FIG. 3 is a diagram of the user interface of a tool that plays and browses a video summary from input video summary description data described according to the description scheme of FIG. 2.

FIG. 4 is a diagram of the data and control flow for hierarchical browsing using the video summary of the present invention.

DETAILED DESCRIPTION OF THE INVENTION 

The present invention will be described in detail by way of a preferred 
embodiment with reference to accompanying drawings, in which like reference numerals 
are used to identify the same or similar parts. 
FIG. 1 is a block diagram illustrating a system for generating video summary description data according to the description scheme of the present invention.

As illustrated in FIG. 1, the apparatus for generating video summary description data according to an embodiment of the present invention is composed of a feature extracting part 101, an event detecting part 102, an episode detecting part 103, a video summary interval selecting part 104, a summary rule defining part 105, a representative frame extracting part 106, a representative sound extracting part 107, and a video summary describing part 108.

The feature extracting part 101 receives the original video and extracts the features needed to generate the video summary. Typical features include shot boundaries, camera motion, caption regions, face regions, and so on.

In the feature extracting step, those features are extracted, and the type of each feature and the video time interval at which it was detected are output to the event detecting step in the format (feature type, feature serial number, time interval).

For example, for camera motion, (camera zoom, 1, 100 ~ 150) indicates that the first camera zoom was detected in frames 100 to 150.
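As an illustration of the (feature type, feature serial number, time interval) record, the sketch below numbers raw detections per feature type. It assumes that serial numbers are assigned in order of each detection's starting frame and that the raw detections come from some unspecified detector; the names FeatureRecord, number_features and format_record are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple


class FeatureRecord(NamedTuple):
    feature_type: str            # e.g. "camera zoom", "shot boundary"
    serial: int                  # 1-based count within this feature type
    interval: Tuple[int, int]    # (first frame, last frame)


def number_features(detections: Iterable[Tuple[str, int, int]]) -> List[FeatureRecord]:
    """Assign per-type serial numbers in temporal order of the start frame."""
    counters = defaultdict(int)
    records = []
    for ftype, start, end in sorted(detections, key=lambda d: d[1]):
        counters[ftype] += 1
        records.append(FeatureRecord(ftype, counters[ftype], (start, end)))
    return records


def format_record(r: FeatureRecord) -> str:
    # Render in the textual form used above, e.g. "(camera zoom, 1, 100 ~ 150)".
    return f"({r.feature_type}, {r.serial}, {r.interval[0]} ~ {r.interval[1]})"
```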

The event detecting part 102 detects the key events included in the original video. Because these events serve as the references for generating the video summary, they must represent the contents of the original video well, and they are generally defined differently depending on the genre of the original video.
These events may either represent a higher semantic level directly or be visual features from which higher-level meaning can be inferred directly. For example, in soccer video, goal, shoot, caption, replay, and so on can be defined as events.

The event detecting part 102 outputs the type and time interval of each detected event in the format (event type, event serial number, time interval). For example, the event information indicating that the first goal occurred between frames 200 and 300 is output as (goal, 1, 200 ~ 300).

On the basis of the detected events, the episode detecting part 103 divides the video, according to the story flow, into episodes, which are larger units than events. After the key events are detected, each episode is detected so as to include the accompanying events that follow its key event. For example, in soccer video, goals and shoots can be key events, and bench scenes, audience scenes, goal ceremony scenes, replays of the goal scene, and so on are the accompanying events of those key events.

That is, episodes are detected on the basis of goals and shoots.
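The grouping of a key event with the accompanying events that follow it could be sketched as below. This is an assumption-laden illustration, not the patented detector: the key event set {goal, shoot}, the input (event type, event serial number, start frame, end frame) records, and the max_gap threshold that decides whether an accompanying event still belongs to the current episode are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    event_type: str
    serial: int
    start: int   # first frame
    end: int     # last frame


KEY_EVENTS = {"goal", "shoot"}   # assumed key events for soccer video


def detect_episodes(events: List[Event],
                    max_gap: int = 50) -> List[Tuple[int, int, List[Event]]]:
    """Group each key event with the accompanying events that follow it.

    Returns (episode start, episode end, member events). An accompanying event
    joins the open episode only if it starts within max_gap frames (hypothetical
    threshold) of the episode's current end.
    """
    episodes = []
    current = None   # [start, end, members] of the open episode
    for ev in sorted(events, key=lambda e: e.start):
        if ev.event_type in KEY_EVENTS:
            if current:
                episodes.append(current)
            current = [ev.start, ev.end, [ev]]
        elif current and ev.start - current[1] <= max_gap:
            current[1] = max(current[1], ev.end)   # extend over the accompanying event
            current[2].append(ev)
        elif current:
            # Gap too large: close the episode; this event starts no episode.
            episodes.append(current)
            current = None
    if current:
        episodes.append(current)
    return [(s, e, members) for s, e, members in episodes]
```

In this sketch, accompanying events that occur before any key event, or after a large gap, are simply left outside every episode.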

The episode detection information is output in the format (episode number, time interval, priority, feature shot, associated event information). Herein, the episode number is the serial number of the episode, and the time interval represents the extent of the episode in shot units. The priority represents the degree of importance of the episode. The feature shot is the number of the shot that contains the most important information among the shots composing the episode, and the associated event information gives the event numbers of the events related to the episode. For example, the episode detection information (episode 1, 4-6, 1, 5, goal 1, caption 3) means that the first episode includes the 4th to 6th shots, its priority is the highest (1), its feature shot is the 5th shot, and its associated events are the first goal and the third caption.
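To make the fields of the episode detection record concrete, the following sketch parses the example record above into a Python dataclass. The textual syntax it accepts (comma-separated fields, a first-last shot range, and event references written as a type followed by a serial number) is inferred from that single example and is an assumption; EpisodeInfo and parse_episode are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EpisodeInfo:
    number: int                     # serial number of the episode
    shots: Tuple[int, int]          # first and last shot (shot units)
    priority: int                   # 1 = highest importance
    feature_shot: int               # shot carrying the most important information
    events: List[Tuple[str, int]]   # associated (event type, event serial number)


def parse_episode(text: str) -> EpisodeInfo:
    """Parse a record such as "(episode 1, 4-6, 1, 5, goal 1, caption 3)"."""
    fields = [f.strip() for f in text.strip("() ").split(",")]
    number = int(fields[0].split()[1])
    first, last = (int(x) for x in fields[1].split("-"))
    priority, feature_shot = int(fields[2]), int(fields[3])
    events = []
    for f in fields[4:]:
        etype, serial = f.rsplit(" ", 1)   # event types may contain spaces
        events.append((etype, int(serial)))
    return EpisodeInfo(number, (first, last), priority, feature_shot, events)
```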

On the basis of the detected episodes, the video summary interval selecting part 104 selects the video intervals that summarize the contents of the original video well. The intervals are selected according to the summary rule predefined by the summary rule defining part 105.

The summary rule defining part 105 defines the rule for selecting summary intervals and outputs a control signal for selecting them. The summary rule defining part 105 also outputs, to the video summary describing part 108, the types of summary events on which the selection of video summary intervals is based.

The video summary interval selecting part 104 outputs the time information of the selected video summary intervals in frame units, together with the event types corresponding to those intervals. For example, the output (100 - 200, goal), (500 - 700, shoot), and so on indicates that frames 100 to 200, frames 500 to 700, and so on were selected as video summary intervals, and that the events of these segments are a goal and a shoot, respectively. Information such as a file name can also be output to facilitate access to a separate video composed only of the video summary intervals.
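The selection and output format described above might look like the following sketch. It assumes that each episode carries its shot range, priority and associated events, that a shot-to-frame table is available for converting shot units to frame units, that the first associated event is the key event, and that the summary rule is a priority threshold plus a set of summary event types. select_intervals and format_intervals are hypothetical helpers.

```python
from typing import Dict, List, Set, Tuple


def select_intervals(episodes: List[dict],
                     shot_frames: Dict[int, Tuple[int, int]],
                     max_priority: int,
                     summary_events: Set[str]) -> List[Tuple[int, int, str]]:
    """Return (first frame, last frame, event) for each selected episode."""
    selected = []
    for ep in episodes:
        key_event = ep["events"][0][0]   # assumption: first associated event is the key event
        if ep["priority"] <= max_priority and key_event in summary_events:
            first_shot, last_shot = ep["shots"]
            # Convert the shot-unit interval into a frame-unit interval.
            selected.append((shot_frames[first_shot][0], shot_frames[last_shot][1], key_event))
    return sorted(selected)


def format_intervals(selected: List[Tuple[int, int, str]]) -> str:
    # Render in the textual form used above, e.g. "(100 - 200, goal)".
    return ", ".join(f"({s} - {e}, {ev})" for s, e, ev in selected)
```

With shots 2 and 3 spanning frames 100 to 200 and shots 5 and 6 spanning frames 500 to 700, a goal episode of priority 1 and a shoot episode of priority 2 would be rendered as "(100 - 200, goal), (500 - 700, shoot)" under a threshold of 2.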

Once the video summary intervals have been selected, the representative frame extracting part 106 and the representative sound extracting part 107 extract the representative frames and the representative sounds, respectively, using the video summary interval information.

The representative frame extracting part 106 outputs either the number of the image frame representing the video summary interval or the image data itself.

The representative sound extracting part 107 outputs either the sound data representing the video summary interval or the time interval of that sound.
To make efficient summary and browsing functionality possible, the video summary describing part 108 describes the related information according to the Hierarchical Summary Description Scheme of the present invention shown in FIG. 2.

The main information of the Hierarchical Summary Description Scheme comprises the types of summary events of the video summary, the time information describing each video summary interval, the representative frame, the representative sound, and the event types in each interval.

The video summary describing part 108 outputs the video summary 
description data according to the description scheme illustrated in FIG. 2. 
FIG. 2 is a drawing that illustrates, in UML (Unified Modeling Language), the data structure of the HierarchicalSummary DS describing the video summary description scheme according to the present invention.

The HierarchicalSummary DS 201 describing the video summary is composed of one or more HighlightLevel DSs 202 and zero or one SummaryThemeList DS 203.

The SummaryThemeList DS provides event-based summary and browsing functionality by enumerating the subjects or events constituting the summary. The HighlightLevel DS 202 is composed of as many HighlightSegment DSs 204 as there are video intervals constituting the video summary at that level, together with zero or more HighlightLevel DSs.

The HighlightSegment DS describes the information corresponding to each video summary interval. The HighlightSegment DS is composed of one VideoSegmentLocator DS 205, zero or more ImageLocator DSs 206, zero or more SoundLocator DSs 207, and the AudioSegmentLocator DS 208.
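The composition shown in FIG. 2 can be mirrored by a set of Python dataclasses, as sketched below. Time information is expressed in frame numbers, the optional media_uri field stands for the "video itself" alternative, and the list and Optional fields follow the cardinalities stated above; all field names are hypothetical and do not reproduce the normative MPEG-7 syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VideoSegmentLocator:          # 205: time information of the highlight segment
    start_frame: int
    end_frame: int
    media_uri: Optional[str] = None   # hypothetical reference to the video itself


@dataclass
class ImageLocator:                 # 206: representative frame
    frame: int


@dataclass
class SoundLocator:                 # 207: representative sound
    start_frame: int
    end_frame: int


@dataclass
class AudioSegmentLocator:          # 208: segment of the audio summary
    start_frame: int
    end_frame: int


@dataclass
class HighlightSegment:             # 204
    video: VideoSegmentLocator
    images: List[ImageLocator] = field(default_factory=list)
    sounds: List[SoundLocator] = field(default_factory=list)
    audio: Optional[AudioSegmentLocator] = None


@dataclass
class HighlightLevel:               # 202: segments plus zero or more lower levels
    segments: List[HighlightSegment]
    children: List["HighlightLevel"] = field(default_factory=list)

    def __post_init__(self):
        if not self.segments:
            raise ValueError("a HighlightLevel needs at least one HighlightSegment")


@dataclass
class HierarchicalSummary:          # 201: one or more levels, zero or one theme list
    levels: List[HighlightLevel]
    theme_list: Optional[list] = None

    def __post_init__(self):
        if not self.levels:
            raise ValueError("a HierarchicalSummary needs at least one HighlightLevel")

    def total_segments(self) -> int:
        # Count highlight segments across all levels, including nested ones.
        def count(level: HighlightLevel) -> int:
            return len(level.segments) + sum(count(c) for c in level.children)
        return sum(count(level) for level in self.levels)
```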
The HierarchicalSummary DS is described in more detail below.

The HierarchicalSummary DS has a SummaryComponentList attribute, which explicitly represents the summary types contained in the HierarchicalSummary DS.
The SummaryComponentList is derived from the SummaryComponentType and enumerates all of the SummaryComponentTypes contained in the summary.

Five SummaryComponentTypes can appear in the SummaryComponentList: keyFrames, keyVideoClips, keyAudioClips, keyEvents, and unconstraint.

keyFrames represents a key frame summary composed of representative frames. keyVideoClips represents a key video clip summary composed of sets of key video intervals. keyEvents represents a summary composed of the video intervals corresponding to events or subjects. keyAudioClips represents a key audio clip summary composed of sets of representative audio intervals. Finally, unconstraint represents user-defined summary types other than these.
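The five component types can be represented as an enumeration. The sketch below assumes, for illustration, that a SummaryComponentList is written as a whitespace-separated list of type names, as XML Schema list types are; parse_component_list is a hypothetical helper, and an unknown name is rejected with a ValueError.

```python
from enum import Enum
from typing import List


class SummaryComponentType(Enum):
    KEY_FRAMES = "keyFrames"            # summary of representative frames
    KEY_VIDEO_CLIPS = "keyVideoClips"   # summary of key video intervals
    KEY_AUDIO_CLIPS = "keyAudioClips"   # summary of representative audio intervals
    KEY_EVENTS = "keyEvents"            # intervals tied to events or subjects
    UNCONSTRAINT = "unconstraint"       # user-defined summary types


def parse_component_list(text: str) -> List[SummaryComponentType]:
    """Parse a whitespace-separated list such as "keyVideoClips keyEvents".

    Enum lookup by value raises ValueError for names outside the five types.
    """
    return [SummaryComponentType(token) for token in text.split()]
```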

Also, to describe an event-based summary, the HierarchicalSummary DS may comprise a SummaryThemeList DS, which enumerates the events (or subjects) contained in the summary and describes their IDs.

The SummaryThemeList has an arbitrary number of SummaryThemes as elements. Each SummaryTheme has an id attribute of ID type and may optionally have a parentId attribute.

The SummaryThemeList DS lets users browse the video summary from the viewpoint of each event or subject described in the SummaryThemeList. That is, an application tool that reads the description data parses the SummaryThemeList DS, presents the subjects to the user, and lets the user select the desired subject.

If these subjects are simply enumerated in a flat list and there are many of them, users may find it difficult to locate the subject they want.

Accordingly, representing the subjects as a tree structure similar to a table of contents (ToC) lets users find the desired subject and then browse efficiently by subject.
To do so, the embodiments of the present invention allow the parentId attribute to be used optionally in the SummaryTheme. The parentId gives the upper element (upper subject) in the tree structure.
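The following sketch shows how an application tool might turn the id and parentId attributes into a ToC-like tree for the user. Each SummaryTheme is modeled as an (id, label, parentId) tuple with parentId set to None for top-level subjects; the tuple layout, the label text and build_theme_tree are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def build_theme_tree(themes: List[Tuple[str, str, Optional[str]]]) -> List[str]:
    """Render SummaryThemes as an indented table of contents.

    Each theme is (id, label, parentId); parentId is None for top-level themes.
    Siblings keep the order in which they appear in the SummaryThemeList.
    """
    children: Dict[Optional[str], List[Tuple[str, str]]] = defaultdict(list)
    for theme_id, label, parent_id in themes:
        children[parent_id].append((theme_id, label))

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for theme_id, label in children.get(parent_id, []):
            lines.append("  " * depth + f"{label} [{theme_id}]")
            walk(theme_id, depth + 1)   # descend into sub-subjects

    walk(None, 0)
    return lines
```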

The HierarchicalSummary DS of the present invention comprises HighlightLevel DSs, and each HighlightLevel DS comprises one or more HighlightSegment DSs, each corresponding to a video segment (or interval) constituting the video summary.

The HighlightLevel DS has a themeIds attribute of IDREFS type.

The themeIds attribute gives the ids of the subjects and events common to all child HighlightLevel DSs of the corresponding HighlightLevel DS or to all HighlightSegment DSs contained in that HighlightLevel; these ids are defined in the SummaryThemeList DS.

The themeIds attribute can denote several events. In an event-based summary, placing in the level's themeIds the subject types common to the HighlightSegments constituting that level avoids unnecessarily repeating the same id in every segment of the level.
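The saving described above amounts to moving the ids shared by every segment of a level up into the level's themeIds. A minimal sketch, assuming each segment's theme ids are available as a Python set and considering only segments (not child levels), is given below; factor_common_theme_ids is a hypothetical name.

```python
from typing import List, Set, Tuple


def factor_common_theme_ids(
        segment_theme_ids: List[Set[str]]) -> Tuple[Set[str], List[Set[str]]]:
    """Move ids shared by every segment of a level up to the level's themeIds.

    Returns (level themeIds, remaining themeIds per segment).
    """
    if not segment_theme_ids:
        return set(), []
    common = set.intersection(*segment_theme_ids)   # ids present in every segment
    return common, [ids - common for ids in segment_theme_ids]
```

For instance, if three goal-related segments each carry the goal id, that id is written once on the level and the segments keep only their distinguishing ids.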

The HighlightSegment DS comprises one VideoSegmentLocator DS and 
one or more ImageLocator DS, zero or one SoundLocator DS and zero or one 
AudioSegmentLocator DS. 

Herein, the VideoSegmentLocator DS describes the time information or the video data itself of the video segment constituting the video summary. The ImageLocator DS describes the image data information of the representative frame of the video segment. The SoundLocator DS describes the sound information representing the corresponding video segment interval. The AudioSegmentLocator DS describes the time information of the interval of the audio segment constituting the audio summary, or the audio information itself.
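For illustration, the sketch below serializes one highlight segment to XML with Python's standard xml.etree.ElementTree module. The element names follow the DS names used in this description, but the attribute names (start, end, frame) and the frame-number encoding are hypothetical and are not the normative MPEG-7 syntax; the cardinalities follow the ones given above (one VideoSegmentLocator, one or more representative frames, and at most one SoundLocator and one AudioSegmentLocator).

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

Span = Tuple[int, int]   # (first frame, last frame)


def _add_interval(parent: ET.Element, tag: str, span: Span) -> ET.Element:
    # Hypothetical attribute names "start"/"end"; not normative MPEG-7 syntax.
    return ET.SubElement(parent, tag, start=str(span[0]), end=str(span[1]))


def highlight_segment_xml(video: Span, key_frames: List[int],
                          sound: Optional[Span] = None,
                          audio: Optional[Span] = None,
                          theme_ids: Optional[List[str]] = None) -> str:
    seg = ET.Element("HighlightSegment")
    if theme_ids:
        seg.set("themeIds", " ".join(theme_ids))    # IDREFS values are space-separated
    _add_interval(seg, "VideoSegmentLocator", video)      # exactly one
    for frame in key_frames:                              # representative frames
        ET.SubElement(seg, "ImageLocator", frame=str(frame))
    if sound is not None:                                 # zero or one
        _add_interval(seg, "SoundLocator", sound)
    if audio is not None:                                 # zero or one
        _add_interval(seg, "AudioSegmentLocator", audio)
    return ET.tostring(seg, encoding="unicode")
```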
The HighlightSegment DS has a themeIds attribute, which uses the ids defined in the SummaryThemeList to describe which of the subjects or events listed in the SummaryThemeList DS relate to the corresponding highlight segment.

The themeIds attribute can denote more than one event. By allowing one highlight segment to have several subjects, the present invention efficiently solves the problem of unavoidable duplicate descriptions in existing event-based summary methods, where the same video segment must be described separately for each event (or subject).
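Event-based browsing with themeIds can be sketched as a recursive search in which each level passes its themeIds down to its segments and child levels, so that an id stated once on a level need not be repeated on every segment. The Segment and Level classes and segments_for_theme are hypothetical; spans are (first frame, last frame) pairs.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class Segment:
    span: Tuple[int, int]                           # (first frame, last frame)
    theme_ids: Set[str] = field(default_factory=set)


@dataclass
class Level:
    segments: List[Segment]
    theme_ids: Set[str] = field(default_factory=set)   # ids common to everything below
    children: List["Level"] = field(default_factory=list)


def segments_for_theme(level: Level, theme_id: str,
                       inherited: frozenset = frozenset()) -> Iterator[Tuple[int, int]]:
    """Yield spans of segments related to theme_id, including ids inherited from levels."""
    ids = inherited | level.theme_ids
    for seg in level.segments:
        if theme_id in ids or theme_id in seg.theme_ids:
            yield seg.span
    for child in level.children:
        yield from segments_for_theme(child, theme_id, ids)
```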

Unlike the existing hierarchical summary description scheme, which describes only the time information of each highlight video interval, the present invention introduces the HighlightSegment DS to describe each highlight segment constituting the video summary. By placing in it a VideoSegmentLocator DS, an ImageLocator DS, and a SoundLocator DS, which describe, respectively, the video interval information, the representative frame information, and the representative sound information of each highlight segment, the present invention enables overview through the highlight segment video as well as efficient navigation and browsing using the representative frame and the representative sound of each segment.

By placing a SoundLocator DS that can describe the representative sound of a video interval, the present invention enables efficient browsing. From a characteristic sound that represents the interval, such as a gunshot, an outcry, the commentator's call in soccer (for example, "goal" or "shoot"), an actor's name in a drama, or a specific word, users can roughly understand within a short time, without playing the video interval, whether it is an important interval containing the desired contents and what contents it contains.

FIG. 3 is a diagram of the user interface of a tool that plays and browses a video summary from input video summary description data described according to the description scheme of FIG. 2.

The video playing part 301 plays the original video or the video summary under the user's control. The original video representative frame part 305 shows the representative frames of the original video shots as a series of reduced-size images.

The representative frames of the original video shots are described not by the HierarchicalSummary DS of the present invention but by a separate description scheme, and they can be used when that description data is provided along with the summary description data described by the HierarchicalSummary DS of the present invention.

By clicking a representative frame, the user accesses the original video shot corresponding
to that representative frame.
The video summary level 0 representative frame and representative sound part 307 and the
video summary level 1 representative frame and representative sound part 306 show the
frame and sound information representing each video interval of video summary level 0
and video summary level 1, respectively. That is, each part consists of a series of
reduced-size images together with icons representing the corresponding sounds.

If the user clicks a representative frame in a video summary representative frame and
representative sound part, the user accesses the original video interval corresponding to
that representative frame. If the user instead clicks the representative sound icon
associated with a representative frame of the video summary, the representative sound of
that video interval is played.
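The two click behaviours described above amount to a small dispatch on the click target. The following is a minimal Python sketch; the `SummarySegment` fields, the target strings `"frame"` and `"sound_icon"`, and the returned action tuples are hypothetical names chosen for illustration, and the actual playback is left to the video playing part.

```python
from dataclasses import dataclass


@dataclass
class SummarySegment:
    """One video summary interval (hypothetical structure; times in seconds)."""
    start: float          # start of the linked original video interval
    end: float            # end of the linked original video interval
    sound_start: float    # start of the representative sound
    sound_end: float      # end of the representative sound


def on_click(segment: SummarySegment, target: str):
    """Map a click in a representative frame/sound part to a playback action."""
    if target == "frame":
        # Clicking the representative frame jumps to the original video interval
        return ("play_original", segment.start, segment.end)
    if target == "sound_icon":
        # Clicking the sound icon plays only the representative sound
        return ("play_sound", segment.sound_start, segment.sound_end)
    raise ValueError(f"unknown click target: {target}")


if __name__ == "__main__":
    seg = SummarySegment(60.0, 75.0, 62.0, 64.5)
    print(on_click(seg, "frame"))
    print(on_click(seg, "sound_icon"))
```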

The video summary controlling part 302 receives the user's control input for playing the
video summary. When a multi-level video summary is provided, the user performs overview
and browsing by selecting the summary of the desired level through the level selecting part
303. The event selecting part 304 lists the events and subjects provided by the
SummaryThemeList, and the user performs overview and browsing by selecting the desired
event. In this way, a user-customized summary is realized.
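Level selection and event selection can be combined into one filtering step over the summary segments. The following is a minimal Python sketch assuming a hypothetical layout in which `levels` maps each level number to a list of segments and each segment carries the set of theme identifiers (taken from the SummaryThemeList) it belongs to; the names `Segment` and `select_summary` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: float
    end: float
    themes: set = field(default_factory=set)  # theme ids from the SummaryThemeList


def select_summary(levels, level, event=None):
    """Return the segments to play for the chosen level and, optionally, event.

    `levels` maps a level number to its list of Segment objects
    (a hypothetical layout). With no event selected, the whole level is played.
    """
    segments = levels[level]
    if event is None:
        return list(segments)
    # Keep only the segments tagged with the selected event or subject
    return [s for s in segments if event in s.themes]


if __name__ == "__main__":
    levels = {
        0: [Segment(0, 10, {"goal"}), Segment(50, 60, {"foul"})],
        1: [Segment(0, 10, {"goal"}), Segment(20, 30, {"goal", "shoot"}),
            Segment(50, 60, {"foul"})],
    }
    for s in select_summary(levels, 1, "goal"):
        print(s.start, s.end)
```

Filtering at play time, instead of storing one summary per event, keeps a single description while still giving each user a summary built around the events that user selects.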

FIG. 4 shows the flow of data and control for hierarchical browsing using the video
summary of the present invention.

Browsing is performed by accessing the browsing data according to the method of FIG. 4,
using the user interface of FIG. 3. The browsing data consist of the video summary, the
representative frames of the video summary, the original video 406 and the original video
representative frames 405.
Here the video summary is assumed to have two levels, although it may of course have more.
Video summary level 0 401 is a shorter summary than video summary level 1 403; that is,
video summary level 1 contains more of the contents than video summary level 0. The video
summary level 0 representative frame 402 is the representative frame of video summary
level 0, and the video summary level 1 representative frame 404 is the representative frame
of video summary level 1.

The video summary and the original video are played through the video playing part 301
shown in FIG. 3. The video summary level 0 representative frames are displayed in the video
summary level 0 representative frame and representative sound part 306, the video summary
level 1 representative frames in the video summary level 1 representative frame and
representative sound part 307, and the original video representative frames in the
original video representative frame part 305.

The hierarchical browsing method illustrated in FIG. 4 can follow various hierarchical
paths, for example:
Case 1: (1)-(2)
Case 2: (1)-(3)-(5)
Case 3: (1)-(3)-(4)-(6)
Case 4: (7)-(5)
Case 5: (7)-(4)-(6)
The overall browsing scheme is as follows. 

First, the user grasps the overall contents of the original video by watching its video
summary; either video summary level 0 or video summary level 1 may be played. If more
detailed browsing is wanted after watching the video summary, the video interval of
interest is identified through the video summary representative frames. If the exact scene
being sought is identified among the video summary representative frames, the user plays it
by directly accessing the video interval of the original video to which that representative
frame is linked. If more detailed information is needed, the user may reach the desired part
of the original video either by examining the representative frames of the next level or by
examining the representative frames of the original video hierarchically.

Whereas finding the desired contents by playing the original video may take a long time,
these hierarchical browsing techniques substantially reduce the browsing time by giving
direct access to the contents of the original video through the hierarchical representative
frames.

Existing general video indexing and browsing techniques divide the original video into
shots, construct a representative frame for each shot, and let the user access a shot by
recognizing the desired shot from its representative frame.

In this case, because the original video contains a large number of shots, considerable time
and effort are needed to find the desired contents among the many representative frames.

In the present invention, the desired video can be accessed quickly because the
representative frames of the video summary form a hierarchy of representative frames.

Case 1 plays video summary level 0 and then accesses the original video directly from a
video summary level 0 representative frame.

Case 2 plays video summary level 0 and selects the representative frame of greatest
interest among the video summary level 0 representative frames. To obtain more detailed
information before accessing the original video, the user then identifies the desired scene
among the video summary level 1 representative frames in the neighborhood of the selected
frame, and then accesses the original video.
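The step of moving from a selected frame to the frames of the next level "in the neighborhood" can be sketched as a time-overlap query. The text does not define the neighborhood; the following minimal Python sketch assumes that each representative frame is linked to an interval of the original video (in seconds) and that a neighbor is any finer-level frame whose interval overlaps the selected interval widened by a hypothetical `margin`. The same lookup could be applied once more to the original video representative frames.

```python
from dataclasses import dataclass


@dataclass
class RepFrame:
    """Representative frame linked to an interval of the original video (seconds)."""
    frame_time: float
    start: float
    end: float


def neighbors(selected: RepFrame, finer_level, margin: float = 0.0):
    """Return frames of the finer level whose intervals overlap the selected
    frame's interval, widened by `margin` seconds on each side.

    Treating the neighborhood as time overlap plus a margin is an assumption.
    """
    lo, hi = selected.start - margin, selected.end + margin
    return [f for f in finer_level if f.start < hi and f.end > lo]


if __name__ == "__main__":
    chosen = RepFrame(105, 100, 120)  # a level 0 frame picked by the user
    level1 = [RepFrame(5, 0, 20), RepFrame(98, 95, 102),
              RepFrame(110, 105, 118), RepFrame(140, 130, 150)]
    print([f.frame_time for f in neighbors(chosen, level1)])
```

Because each step only examines the few frames near the chosen one, the user never has to scan the full list of shot representative frames.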

Case 3 applies when, in case 2, accessing the original video directly from a video summary
level 1 representative frame is difficult. To obtain more detailed information, the user
selects the level 1 representative frame of greatest interest, identifies the desired scene
among the original video representative frames neighboring that frame, and then accesses the
original video through the representative frame of the original video.
Cases 4 and 5 start by playing video summary level 1; their paths are otherwise similar to
those of the cases above.
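The five example paths can be checked mechanically if each numbered arrow of FIG. 4 is read as a transition between browsing data. The mapping of arrows to transitions in `EDGES` below is inferred from the case descriptions above and is an assumption, since the arrow labels are not spelled out in the text; the node names and the `trace` helper are hypothetical.

```python
# Assumed reading of FIG. 4: each numbered arrow moves between browsing data.
EDGES = {
    1: ("summary_level0", "summary_level0_frames"),
    2: ("summary_level0_frames", "original_video"),
    3: ("summary_level0_frames", "summary_level1_frames"),
    4: ("summary_level1_frames", "original_frames"),
    5: ("summary_level1_frames", "original_video"),
    6: ("original_frames", "original_video"),
    7: ("summary_level1", "summary_level1_frames"),
}

# The example paths listed for FIG. 4
CASES = {1: [1, 2], 2: [1, 3, 5], 3: [1, 3, 4, 6], 4: [7, 5], 5: [7, 4, 6]}


def trace(arrows):
    """Follow a sequence of arrows and return the visited nodes.

    Raises ValueError if an arrow does not start where the previous one ended.
    """
    nodes = [EDGES[arrows[0]][0]]
    for a in arrows:
        src, dst = EDGES[a]
        if src != nodes[-1]:
            raise ValueError(f"arrow ({a}) does not start at {nodes[-1]}")
        nodes.append(dst)
    return nodes


if __name__ == "__main__":
    for case, arrows in CASES.items():
        print(f"Case {case}:", " -> ".join(trace(arrows)))
```

Under this reading every case is a continuous path that ends at the original video, which matches the descriptions of cases 1 to 5.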

When applied to a server/client environment, the present invention can provide a system in
which multiple clients access one server for video overview and browsing. The server
receives the original video as input and is equipped with the video summary description data
generation system, which produces video summary description data on the basis of the
hierarchical summary description scheme and links the original video with that description
data. A client accesses the server through a communication network, obtains an overview of
the video using the video summary description data, and browses and navigates the video by
accessing the original video.
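The division of roles between server and client can be sketched without any networking. The following minimal Python sketch uses an in-memory class as a stand-in for the server; the class name, its methods, and the description layout (a mapping from level number to a list of (start, end) intervals) are hypothetical. Linking an interval back to the original video with a `#t=start,end` suffix follows the W3C Media Fragments temporal syntax and is only one possible choice.

```python
class SummaryServer:
    """In-memory stand-in for the server (hypothetical API, no networking).

    It stores each original video together with the summary description
    data generated for it, so that clients can overview and then browse.
    """

    def __init__(self):
        self._store = {}  # video_id -> (original_uri, description)

    def register(self, video_id, original_uri, description):
        # `description` maps a level number to a list of (start, end) intervals
        self._store[video_id] = (original_uri, description)

    def summary(self, video_id, level):
        # Overview: a client fetches the intervals of one summary level
        return self._store[video_id][1][level]

    def original_link(self, video_id, start, end):
        # Browsing and navigation: resolve an interval to the original video
        uri, _ = self._store[video_id]
        return f"{uri}#t={start},{end}"


if __name__ == "__main__":
    server = SummaryServer()
    server.register("v1", "http://example.com/v1.mpg",
                    {0: [(0, 10)], 1: [(0, 10), (30, 45)]})
    # Two clients can share the same server object
    print(server.summary("v1", 0))
    print(server.original_link("v1", 30, 45))
```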

Although the present invention has been described on the basis of preferred embodiments,
these embodiments are illustrative and do not limit the present invention. It will also be
appreciated by those skilled in the art that changes and variations can be made to the
embodiments herein without departing from the spirit and scope of the present invention as
defined by the following claims and their equivalents.